1. INTRODUCTION

Globally, mobile phone usage and voice and data traffic are surging. Recent figures suggest that 
global growth will accelerate [1]. Like electricity, wireless communication is increasingly essential for most 
social, economic, and industrial activities. Due to this global trend, demand for faster data rates will grow, 
requiring more radio spectrum use. However, the radio spectrum is a limited natural resource controlled by 
National Regulatory Authorities (NRAs) [2]. As spectrum use rises, a shortage appears, and this is traceable to
the inherent shortcomings of the conventional fixed spectrum access (FSA) used in most countries [3]. Based
on an analysis of spectrum utilization and coverage internationally, the FSA policy will not be able to
accommodate the growth of mobile connections and increased data transmission speeds in the coming years.
To enhance quality of service (QoS) and user experience, more comprehensive and scalable spectrum access
is needed, enabling users of more crowded channels to utilize available and less occupied channels
seamlessly. Dynamic spectrum access (DSA) is a flexible spectrum policy linked with the IEEE 802.22
standard, and cognitive radio (CR) is required for DSA implementation.

Based on the CR definition as detailed in [4]-[7], a transceiver in a CR system can automatically
identify the available spectrum and then use the vacant channels while skipping the occupied ones. It
optimizes limited radio resources while causing the least disturbance to primary and secondary users,
and DSA frees up idle capacity in occupied but underutilized bands such as TV white space [8]. Although
other functions play their roles, spectrum sensing is the most fundamental component of CR operation [9].
Several traditional algorithms for CR spectrum sensing are well documented and frequently used [10], while
machine learning (ML) algorithms are more recent ways to improve CR system performance. They use the
classification concept to detect the availability of frequency channels [11]. Automatic modulation
recognition (AMR)-based spectrum sensing has gained scientific attention in recent years. It is an
automated approach for recognizing the modulation class and features of signals [12], based on the concept
that primary users (PUs) use a defined modulation technique for transmission within a given frequency
channel. The absence of any modulation scheme in the channel means it is free and safe for transmission by
a secondary user (SU) [13].

A wide variety of AMR techniques for spectrum sensing have been developed in the literature and are
classified into two major categories: (i) likelihood-based (LB) and (ii) feature-based (FB) techniques. LB
approaches use hypothesis testing theory, and even though their performance is considered optimal, they are
prone to high computational complexity. FB methods were created for practical application and typically
extract features after preprocessing, employing classifiers to accomplish modulation classification. Various
feature parameters can also be utilized to distinguish between multiple digital signals [14], [15]. The FB
technique is further subdivided into shallow and deep learning techniques [12]. Although shallow machine
learning-based classifiers have been used successfully, their manual feature engineering relies on
professional expertise, which may impair performance. Deep learning-based techniques for AMR have been
presented due to their self-learning capabilities, especially when presented with an unfamiliar
environment [16].

Important studies on feature-based AMR spectrum sensing have been documented in
the literature. However, the majority of the studies that investigated AMR for spectrum sensing in CR used a
variety of simulated datasets and feature types, such as constellation shapes, pseudo Wigner-Ville distribution
(PWVD) coefficients, fractional lower-order statistics, and higher-order statistics [17]-[24], with only a few
reports on results based on real datasets [13], [25]. Simulated datasets are not subjected to the signal
degradation effects that normally occur in real-time wireless communication scenarios. Thus, models that are
based on such datasets are likely to have limited performance in real-time deployment. In addition, the use
of complex feature extraction techniques incurs substantial computational costs.

In the present study, real-time over-the-air radio frequency (RF) datasets were collected and curated,
and non-complex first-order statistical features were used to create an AMR model. As a first step
toward adding opportunistic spectrum sensing to the recently developed nomadic base transceiver station
(NomadicBTS), a new base station architecture based on software-defined radio (SDR) technology for CR
applications, this study describes the use of real-time over-the-air digital RF data for the development of a
digital spectrum sensing model based on automatic modulation classification (AMC), while exploring
selected digital modulations.


2. METHOD 

The nomadic base transceiver station (NomadicBTS) proposed in [26] is designed and built
essentially on software-defined radio (SDR). The NomadicBTS architecture has two vital sub-modules:
the front end houses the SDR hardware, while the SDR software operates on a personal computer (PC)
at the back end [26]. The architecture was extended in our study reported in [27] by incorporating CR
capability with the AMR-based spectrum sensing model in the NomadicBTS architecture, where four (4)
classes were considered, namely, (i) amplitude modulation (AM), (ii) Gaussian minimum shift keying
(GMSK), (iii) frequency modulation (FM), and (iv) noise (no modulation).

In the current study, however, the following modulation schemes were evaluated to further advance
the AMR model in the NomadicBTS architecture for real-time over-the-air digital RF signals: (i) quadrature
phase shift keying (QPSK), (ii) Gaussian minimum shift keying (GMSK), (iii) binary phase shift keying
(BPSK), (iv) eight-ary phase shift keying (8PSK), (v) 16-quadrature amplitude modulation (16-QAM), and
(vi) 64-quadrature amplitude modulation (64-QAM). A no-modulation signal (noise) was also incorporated
to represent spectrum opportunities or gaps in a real-world situation. As seen in Figure 1, the model in
this paper classifies signals into seven categories (six modulation schemes plus no modulation). With the
implementation of this model in the NomadicBTS architecture [26], [27] for practical deployment, it will be
able to differentiate between occupied and vacant spectrum bands. It will also indicate the type of
modulation scheme for an appropriate choice of demodulation algorithm, which enables adaptability across
wireless standards in an SDR scenario. This is crucial given the widespread usage of SDRs in modern
wireless communication systems alongside satellite, spectrum sensing, and cellular systems [26]-[29]. If any
modulation scheme associated with any 2G to 4G communication technology is detected, the spectrum band
is in the occupied state. If, however, only noise is detected, the spectrum band is in the available (free)
state. The following sections describe the phases involved in implementing the AMR model in this study.

Figure 1. Flowchart for AMC-based spectrum sensing model development
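
The occupied/vacant decision described above can be illustrated with a minimal Python sketch. The class labels and the function name `band_state` are hypothetical names chosen for illustration; they are not taken from the NomadicBTS implementation.

```python
# Hypothetical class labels for the seven model outputs (six schemes plus noise).
MODULATED_CLASSES = {"QPSK", "GMSK", "BPSK", "8PSK", "16-QAM", "64-QAM"}
NOISE_CLASS = "noise"


def band_state(predicted_class):
    """Map the classifier output for one band to a spectrum occupancy state."""
    if predicted_class == NOISE_CLASS:
        return "vacant"      # only noise detected: band is free for a secondary user
    if predicted_class in MODULATED_CLASSES:
        return "occupied"    # a primary-user modulation scheme was recognized
    raise ValueError(f"unknown class label: {predicted_class!r}")
```

In such a sketch, the recognized class label could also be passed on to select a demodulation routine, in line with the adaptability goal stated above.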


2.1. Real-time RF data acquisition 

For this study, raw RF signals at frequencies matching the specified modulation schemes were
recorded across the 2G to 4G cellular standards and WiFi (see Table 1). The data acquisition campaign was
carried out at Covenant University, Ota, Ogun State, Nigeria (Figure 2), a Smart Campus with coverage for
all the itemized wireless standards in Table 1. The real-time RF dataset was obtained from a setup
comprising the Universal Software Radio Peripheral (USRP B200) as the hardware and GNU Radio
Companion (GRC) as the software, configured on Ubuntu Linux 16.04 LTS as the operating system
(OS). This setup allows the USRP to communicate effectively with the host computer, as shown in Figure 3.
The technical and operating parameters of the USRP B200 are detailed in [27], while the parameter
configurations for the different modulation schemes in this study are presented in Table 2. Fifty (50)
real-time signals were gathered for each class, resulting in a total of 350 samples.


Table 1. Mobile technologies and respective modulation schemes [30]

Communications technology                          Wireless generation  Centre frequency (MHz)      Modulation scheme
Global System for Mobile Communications (GSM)      2G                   900, 1800                   GMSK
General Packet Radio Service (GPRS)                2.5G                 900, 1800                   GMSK
Enhanced Data Rate for Global Evolution (EDGE)     2.75G                900, 1800                   8PSK
Universal Mobile Telecommunications System (UMTS)  3G                   900, 2100                   QPSK
High Speed Packet Access (HSPA)                    3.5G                 2100                        QPSK, 16-QAM
Long Term Evolution (LTE)                          4G                   700, 800, 1800, 2300, 2600  QPSK, 16-QAM, 64-QAM
Wireless Fidelity (Wi-Fi/WLAN)                     -                    2400, 5000                  BPSK, QPSK, 16-QAM, 64-QAM


Figure 2. Location of one of the base stations for GSM data acquisition (latitude 6.6658° N, longitude
3.1588° E)


Figure 3. Interconnection of USRP B200 with host PC for real data acquisition campaign 


Table 2. Parameter configurations for the modulation schemes in this study

Modulation scheme  Wireless standard  Bandwidth (kHz)  Operator  Downlink frequency range (MHz)  Centre frequency (MHz)
GMSK               GSM - 2G           200              Globacom  945-950                         947.5
                   GPRS - 2.5G                         MTN       950-955                         952.5
8PSK               EDGE - 2.75G       200              Airtel    955-960                         957.5
QPSK               UMTS - 3G          5                MTN       2,110-2,120                     2115
                   LTE - 4G


Although BPSK, QPSK, 16-QAM, and 64-QAM are all deployed in WiFi, they differ in
terms of received signal strength indicator (RSSI) sensitivity and data rate. Table 3 shows the theoretical data
rates and minimum RSSI sensitivities for each of the modulation schemes in WiFi. The dataset for this
study is available at the Advanced Signal Processing and Machine Intelligence Research (ASPMIR)
Laboratory, Covenant University, Ota, Nigeria. To carry out an accurate signal acquisition for each of the
WiFi modulation schemes during our campaign, strategic locations within the Covenant University campus
were selected through the use of the Network Signal Information Pro mobile application. The software
interface, shown in Figure 4, displays the parameters in Table 3 and other information about any WiFi access
point (AP) that is enabled and connected. Real signals were captured at locations where the network
information corresponded with the data rate range and minimum RSSI of one of the modulation schemes, as
detailed in Table 3.


Table 3. Theoretical data rates and minimum RSSI sensitivities for the WiFi modulation schemes

Modulation scheme  Theoretical data rate (Mbps)  RSSI (dBm) for 20-MHz channel BW  RSSI (dBm) for 40-MHz channel BW
BPSK               6.50 to 7.20                  -82                               -79
QPSK               13.00 to 21.70                -79 to -77                        -76 to -74
16-QAM             26.00 to 43.30                -74 to -70                        -71 to -67
64-QAM             52.00 to 72.20                -66 to -64                        -63 to -61


Figure 4. Network Signal Info Pro with WiFi parameters
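
As an illustration of how a capture location could be matched to a WiFi modulation scheme using Table 3, the following Python sketch checks a reported data rate and RSSI against the 20-MHz column. The matching rule (data rate inside the range and RSSI at or above the lowest listed sensitivity) is an assumption made for this sketch, and the names `WIFI_TABLE_20MHZ` and `wifi_modulation` are hypothetical.

```python
# Values copied from Table 3 (20-MHz channel bandwidth column).
# Each entry: scheme -> (min rate Mbps, max rate Mbps, lowest minimum-RSSI in dBm).
WIFI_TABLE_20MHZ = {
    "BPSK": (6.50, 7.20, -82),
    "QPSK": (13.00, 21.70, -79),
    "16-QAM": (26.00, 43.30, -74),
    "64-QAM": (52.00, 72.20, -66),
}


def wifi_modulation(rate_mbps, rssi_dbm, table=WIFI_TABLE_20MHZ):
    """Return the scheme consistent with the reported link parameters, else None.

    Assumed rule: the data rate must fall inside the scheme's theoretical range
    and the RSSI must be at or above the scheme's minimum sensitivity.
    """
    for scheme, (rate_lo, rate_hi, min_rssi) in table.items():
        if rate_lo <= rate_mbps <= rate_hi and rssi_dbm >= min_rssi:
            return scheme
    return None
```

A location whose reported parameters match no row would simply be skipped for that capture under this rule.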


2.2. Data preprocessing 

Each signal acquired at a particular frequency was digitized by the USRP B200 circuitry, which was
used to realize the RF front end of the NomadicBTS architecture. The received signal passes through
different stages at the front end, namely: (i) filtering, (ii) down-conversion, (iii) signal conditioning, (iv)
analog-to-digital conversion (ADC), and (v) digital signal processing (DSP), to produce the digitized format
of the RF signal. The digitized signals were stored as .dat files in the GRC flow graph. Each of these binary
files was further preprocessed into a vector of float numbers. Algorithm 1 below outlines the procedure for
converting each .dat file into a float vector. The algorithm was implemented with MATLAB R2017a.


Algorithm 1. Data conversion algorithm

Input: K, M
    K is the number of .dat files for each modulation class
    M is the number of modulation classes, each being represented by a separate directory
Output: K
    K represents the number of saved .mat files representing the corresponding .dat files

(1)  Clear the workspace
(2)  for all i = 1, 2, ..., M do
(3)      Enable directory containing the .dat files to be opened
(4)      for all j = 1, 2, ..., K do
(5)          Use fopen('FileName') to get file identifier f_id
(6)          if (f_id >= 3)
(7)              Declare a float vector X
(8)              Use fread(f_id) to extract the file in vector form stored in X
(9)              Save the data vector X as a MATLAB file with .mat extension
(10)         else
(11)             Check the current directory and change it to the directory containing the desired
                 file OR save the desired file into the directory.
(12)         end if
(13)     end for
(14) end for
(15) Stop
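
For readers working outside MATLAB, the conversion loop above can be approximated with the Python standard library. This is a sketch under stated assumptions rather than the implementation used in this study: it assumes each .dat file holds raw 32-bit floats in native byte order (as a GNU Radio file sink typically writes for float or interleaved complex streams) and one sub-directory per modulation class. The function names `read_float_vector` and `load_dataset` are hypothetical.

```python
from array import array
from pathlib import Path


def read_float_vector(dat_path):
    """Read one raw capture as a vector of 32-bit floats (assumed sample format)."""
    raw = Path(dat_path).read_bytes()
    usable = len(raw) - len(raw) % 4      # ignore a trailing partial sample, if any
    samples = array("f")
    samples.frombytes(raw[:usable])
    return samples


def load_dataset(root_dir):
    """Return {class name: [float vectors]} for every <root>/<class>/*.dat file."""
    dataset = {}
    for class_dir in sorted(Path(root_dir).iterdir()):   # outer loop over M classes
        if not class_dir.is_dir():
            continue
        dataset[class_dir.name] = [
            read_float_vector(path)                      # inner loop over K files
            for path in sorted(class_dir.glob("*.dat"))
        ]
    return dataset
```

If the captures were recorded as complex samples, the resulting vector would contain interleaved in-phase and quadrature values, which a later step would need to separate.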


2.3. First-order statistical features 

In this work, first-order statistics (FOS) values were extracted from all preprocessed signal samples.
FOS features were considered and employed in this study for the following reasons: (i) the ability to identify
distinctive attributes of signals, (ii) the awareness of signal modulation types, (iii) lack of sensitivity to
variations in signal-to-noise ratio (SNR), and (iv) the significantly lower complexity compared to
higher-order statistics, given our ultimate goal of achieving an on-device deployment of the AMR model [31].
The statistical parameters used are mean, variance, standard deviation, kurtosis, skewness, root mean square
(RMS), median, and entropy, with mathematical details in [32]. The feature extraction procedure in this
study is detailed in Algorithm 2 below, and its implementation was carried out in the MATLAB R2017a
environment.


Algorithm 2. Feature extraction algorithm

Input: K, M
    K is the number of .mat files saved for each modulation class.
    M is the number of directories where the .mat files are saved. Each directory represents a
    modulation class.
Output: The features to be extracted from each .mat file.

(1)  Clear the workspace
(2)  for all i = 1, 2, ..., M do
(3)      Enable directory containing the .mat files to be loaded
(4)      for all j = 1, 2, ..., K do
(5)          if (.mat file = "saved")
(6)              Load the .mat file in the workspace to enable vector X
(7)              Get the mean feature of X using mean(X)
(8)              Get the variance feature of X using var(X)
(9)              Get the standard deviation feature of X using std(X)
(10)             Get the skewness feature of X using skewness(X)
(11)             Get the kurtosis feature of X using kurtosis(X)
(12)             Get the root mean square feature of X using rms(X)
(13)             Get the entropy feature of X using entropy(X)
(14)             Get the median feature of X using median(X)
(15)             Save these features for the sample in Excel spreadsheet
(16)         else
(17)             Repeat the steps outlined in Algorithm 1 and save the extracted float file as .mat
                 file in the appropriate directory.
(18)         end if
(19)     end for
(20) end for
(21) Stop
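
The eight FOS features listed above can be computed as in the following Python sketch, which uses only the standard library. It is an illustration, not the MATLAB code used in this study: population (divide-by-N) moments are assumed throughout, and entropy is taken as the Shannon entropy of a histogram with an assumed bin count, so values may differ from the MATLAB functions named in Algorithm 2. The name `first_order_features` is hypothetical.

```python
import math
from statistics import mean, median, pvariance


def first_order_features(x, bins=256):
    """Return the eight first-order statistics of a non-empty sample vector x."""
    x = list(x)
    n = len(x)
    mu = mean(x)
    var = pvariance(x, mu)                       # population variance (assumption)
    std = math.sqrt(var)
    m3 = sum((v - mu) ** 3 for v in x) / n       # third central moment
    m4 = sum((v - mu) ** 4 for v in x) / n       # fourth central moment
    skew = m3 / std ** 3 if std > 0 else 0.0
    kurt = m4 / std ** 4 if std > 0 else 0.0
    rms = math.sqrt(sum(v * v for v in x) / n)

    # Shannon entropy (bits) of an equal-width histogram over [min, max].
    lo, hi = min(x), max(x)
    width = (hi - lo) / bins or 1.0              # avoid zero width for constant input
    counts = [0] * bins
    for v in x:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    entropy = -sum((c / n) * math.log2(c / n) for c in counts if c)

    return {"mean": mu, "variance": var, "std": std, "skewness": skew,
            "kurtosis": kurt, "rms": rms, "median": median(x), "entropy": entropy}
```

Each call yields one eight-element feature row per signal sample, matching the eight input neurons of the classifier described in Section 2.4.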


2.4. Development of a classification model 

This stage involves building and training classification model configurations to differentiate between the
seven classes, which are the six modulation schemes and a no-modulation output (noise). Multiple
experiments were conducted utilizing two classification models: kernel-based SVM and multilayer
perceptron ANN (MLP-ANN). The experiments employed the following MLP-ANN model specifications:

a. Architecture type: a feed-forward MLP-ANN with an input layer of 8 neurons (one per FOS feature), an
experimentally varied number of hidden-layer neurons, and an output layer of 7 neurons for the 7 classes [13].

b. Activation functions: in the input layer, a linear activation function, i.e., Purelin, was utilized. In addition,
to incorporate non-linearity into the network, the bipolar sigmoidal function, i.e., the Tan-Sigmoid
function, was applied to both the hidden layer and the output layer [13].

c. Learning algorithms: to train the model, two variants of the backpropagation algorithm were employed: the
Levenberg-Marquardt (LM) and the scaled conjugate gradient (SCG). They were chosen based on their
training speed, efficiency, stability, and accuracy [13], [29].

d. Performance functions: to evaluate training performance, the mean square error (MSE) and accuracy were
utilized.
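
The MLP-ANN specification above can be sketched as a forward pass in plain Python: a linear (identity) input layer, followed by tanh hidden and output layers, with MSE as the performance function. This is an untrained illustration only; the weight initialization range and the function names are assumptions, and the LM and SCG training procedures are not reproduced here.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Create (weights, biases) with small random weights (assumed range)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense_tanh(inputs, weights, biases):
    """One fully connected layer followed by the tan-sigmoid (tanh) activation."""
    return [math.tanh(sum(w * v for w, v in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(features, hidden_layer, output_layer):
    """8 features -> hidden layer -> 7 outputs; the input layer is linear (identity)."""
    hidden = dense_tanh(features, *hidden_layer)
    return dense_tanh(hidden, *output_layer)


def mse(predicted, target):
    """Mean square error between an output vector and its target vector."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


def predicted_class(outputs):
    """Index of the output neuron with the largest activation."""
    return max(range(len(outputs)), key=outputs.__getitem__)
```

For example, `init_layer(8, 70, random.Random(0))` and `init_layer(70, 7, random.Random(1))` would give layers with the 8-70-7 shape reported later for the best model.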


3. RESULTS AND DISCUSSION 

For the development of the AMR model, different classification model configurations were used to
differentiate between the seven classes. Several experiments were carried out to find the best model for this
objective using the kernel-based SVM and the MLP-ANN classification models. For the architecture,
a feed-forward MLP-ANN was set up, with Purelin as the linear activation function at the input layer.
Non-linearity was incorporated into the network by applying the Tan-Sigmoid function to the hidden and
output layers. The LM and SCG algorithms were employed for the training of the model. The performance
was evaluated using the mean square error (MSE), while the accuracy was established using the confusion
matrix and the receiver operating characteristic (ROC) methodology. Figures 5(a)-(g) depict some
spectrum graphs as samples of the raw RF signals obtained. These plots were generated and displayed on
the FFT sink in the GRC environment.

Figure 5. Spectrum plots for (a) sample of GMSK signals, (b) sample of 8PSK signals, (c) sample of QPSK
signals, (d) sample of BPSK signals, (e) sample of 16-QAM signals, (f) sample of 64-QAM signals, and
(g) sample of no-modulation signals

Figures 6(a)-(g) present bar charts of the extracted FOS attributes for each class. On the horizontal axis of
each chart, the FOS attributes are the mean, standard deviation, variance, skewness, RMS, kurtosis, median,
and entropy. As illustrated, the patterns of the FOS attributes for the different classes are distinct, which
is a vital element for pattern recognition using the AMR technique. This suggests that the AMR model is
likely to classify the signals correctly. The LM and SCG algorithms were compared
to determine the best learning algorithm for the AMR model developed. The number of neurons in the hidden
layer was varied for each learning algorithm in order to systematically ascertain the number of hidden-layer
neurons that generated a low MSE with the highest accuracy.

Figure 6. Feature bar charts for (a) GMSK sample, (b) 8PSK sample, (c) QPSK sample, (d) BPSK sample, (e)
16-QAM sample, (f) 64-QAM sample, and (g) noise sample
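
The hidden-layer search described above can be outlined as a simple sweep. In this sketch, `train_and_evaluate` is a hypothetical callable standing in for one training run with a given learning algorithm; it is assumed to return an (MSE, accuracy) pair. Ranking by highest accuracy first and lowest MSE second is also an assumption about how the two criteria are combined.

```python
def select_hidden_size(candidate_sizes, train_and_evaluate):
    """Return (size, mse, accuracy) of the best configuration in the sweep.

    train_and_evaluate(size) -> (mse, accuracy) is supplied by the caller.
    """
    best = None
    for size in candidate_sizes:
        mse, accuracy = train_and_evaluate(size)
        key = (-accuracy, mse)          # prefer higher accuracy, then lower MSE
        if best is None or key < best[0]:
            best = (key, size, mse, accuracy)
    if best is None:
        raise ValueError("candidate_sizes must not be empty")
    return best[1], best[2], best[3]
```

Running the same sweep once per learning algorithm would allow the LM and SCG results to be compared on equal terms.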

Based on the comparison of the findings, with seventy (70) neurons in the hidden layer, the
model had the lowest MSE of 0.0131 and the highest accuracy of 93.5% when trained using the LM
algorithm. Additionally, it was observed that when training with LM repeatedly for a given number of
hidden-layer neurons, the accuracy values obtained were relatively steady and within an acceptable
range. With SCG, however, the accuracy obtained for each given number of hidden-layer neurons
varied widely between runs without a clear pattern. Figures 7 and 8 present the confusion
matrix and ROC curves, respectively, of the best AMR model from this study. This model's specifications
are shown in Table 4, and its topology is shown in Figure 9. The optimal AMR model obtained in this study
is based on compact FOS features and will form a CR component for the real-time deployment of the
NomadicBTS architecture to achieve dynamic spectrum sensing [26]. Similar efforts on the use of statistical
features and CR for spectrum sensing have also been reported in the literature [27]-[29].


Figure 7. Confusion matrix for 70 hidden-layer neurons for the LM-trained model (axes: target class, output class)

Figure 8. ROC curves for the 70 hidden-layer neurons in the LM-trained model


The confusion matrix for the proposed model is presented in Figure 7. As shown, most of
the modulation signal types are classified correctly, with an overall accuracy of 93.5%. This result indicates
that the proposed model demonstrates an acceptable classification capability for the various modulation
signals. As shown in Figure 8, the area under the curve (AUC) metric was used to evaluate the overall test
accuracy of the optimal model by plotting the output probabilities based on the ROC methodology for the
seven (7) classes. It is generally observed that the model performed satisfactorily, with
classes 1, 5, and 7 recording the highest AUC, followed by classes 2, 3, and 4, while class 6 had
the lowest AUC.
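
The two evaluation quantities discussed above can be computed as in the following Python sketch. Overall accuracy is the sum of the confusion-matrix diagonal divided by the total count, and a one-vs-rest AUC for a single class is estimated here with the rank (Mann-Whitney) formulation. The function names are hypothetical, and no values from Figures 7 and 8 are reproduced.

```python
def accuracy_from_confusion(cm):
    """Overall accuracy: correctly classified samples (diagonal) over all samples."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def one_vs_rest_auc(scores, is_positive):
    """AUC for one class: probability that a positive sample outscores a negative one.

    scores[i] is the model output for the class of interest on sample i, and
    is_positive[i] is True when sample i truly belongs to that class. Ties count
    as half. Both groups must be non-empty.
    """
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Applying `one_vs_rest_auc` once per class gives the per-class ranking of the kind reported above.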

Figure 9. Topology of the best AMR model


Table 4. Characteristics of the best AMR model

Characteristics                          Description
Number of neurons at the input layer     8
Number of neurons at the hidden layer    70
Number of neurons at the output layer    7
Input layer’s activation function        Purelin
Hidden layer’s activation function       Tan-sigmoid
Output layer’s activation function       Tan-sigmoid
Mean square error (MSE)                  0.0131
Accuracy                                 93.5%
Learning algorithm                       Levenberg-Marquardt (LM)


4. CONCLUSION

This paper presented the development of an AMR-based spectrum sensing model toward the
implementation of opportunistic spectrum sensing in the NomadicBTS architecture. Real-time over-the-air
RF datasets were gathered from an experimental setup comprising the USRP B200 device and the GRC
software, and non-complex first-order statistical features were used as descriptors to design the AMR model.
Selected digital modulation techniques for second-generation (2G) through fourth-generation (4G)
technologies were evaluated, and the accuracy of the best model was determined. The model is expected to
improve the identification of spectral holes within the bands considered. Complete prototyping of the
CR-based NomadicBTS architecture (incorporating the AMR model for the fifth generation (5G) mobile
technologies), interoperability of multiple NomadicBTS for cooperative spectrum sensing, development of
deep learning-based AMR models with more digital modulation schemes, and prototyping of the architecture
for other use cases are all promising areas for further research.